Porting Tools.h++ to ObjectStore

Written by Jim Shur, Rogue Wave Software, Inc.

© Copyright Rogue Wave Software, Inc. 1994.

Introduction

This report describes the port of Rogue Wave's Tools.h++ foundation class library to Object Design's object-oriented database system, ObjectStore. The report should answer questions such as, "What does it mean to port Tools.h++ to ObjectStore?", "What was the impact of the port on the existing Tools.h++ source code?", and "What will it take to port an existing application that currently links to the standard version of Tools.h++?". More generally, we present several issues that one is likely to face when porting a C++ class library for use with ObjectStore.

ObjectStore Basics

ObjectStore is an object database management system (ODBMS). Touted by some as the successor to relational database management, ODBMSs provide support for traditional database features such as persistent data, transaction processing, query facilities, and concurrency and access control. Unlike their relational predecessors, however, ODBMSs directly support the persistence of objects - in the sense of programming languages such as C++ and Smalltalk - as such, without the programmer having to somehow map the object's contents and behavior into the rows and columns of a relational system. This feature makes ODBMSs particularly well suited for applications that maintain data in complex relationships, where access is best achieved by following pointers between related objects, so-called navigational access. According to the ObjectStore User Guide, computer-aided design, publishing applications, medical imaging, seismic analysis, and molecular modeling are all examples of systems that use the sort of non-record-oriented data which is better handled by object databases than by relational databases.

The aspect of ObjectStore most relevant to this project is its persistence mechanism. A persistent object is one which is placed in non-volatile storage so as to outlive the process which created it. Persistent objects live in a database, and can be retrieved and manipulated at any time by a process with appropriate access. In keeping with the goals of the emerging ODBMS standard, ObjectStore implements as much of its functionality as possible within the C++ language. This is achieved by providing a class library encapsulating such concepts as databases, transactions, and queries, as well as several overloaded forms of C++'s operator new. This latter feature is central to creating persistent objects with ObjectStore. The basic form of ObjectStore's overloaded operator new takes three parameters, which allow the programmer to answer three questions concerning the object he or she is creating:

  1. Where --- In what database will the object live?
  2. What --- What sort (type, or class) of object is it?
  3. How many --- How many are being stored (for arrays)?

To answer the first question, the programmer provides a pointer to an instance of ObjectStore's os_database class. Objects of another ObjectStore provided class, os_typespec, tell the persistence mechanism what type of object is being stored. The third parameter is optional, defaulting to 1. Programmers use this parameter when allocating persistent arrays of objects.

os_database* students_db;
os_typespec* student_type;
// ...
// create a student in the students_db database:
student_ptr = new(students_db, student_type) Student("Louis Witt");  
// Create an array of 10 students in the students_db database:
student_group = new(students_db, student_type, 10) Student[10];

Note that pointers to objects that live in databases (like student_ptr and student_group above) do not require special handling. In most cases, they can be used just like pointers to transient objects on the heap. This includes deleting the object; there is no special operator delete corresponding to the overloaded operator new.

// remove student "Louis Witt" from the students_db database.
delete student_ptr;

ObjectStore databases are further divided into segments, and another form of operator new allows the programmer to specify in which segment within a database to place a persistent object. As segments are generally the unit of transfer from persistent storage to program memory, programmers can increase efficiency by placing objects that exhibit locality of reference into the same segment. A static member function of the os_segment class can be used to discover the segment of an object.

// create a student in the students_db database:
student1 = new(students_db, student_type) Student("Bert Russ");  
// create a new student in the same segment of students_db as Bert Russ:
os_segment* seg = os_segment::of(student1);
student2 = new(seg, student_type) Student("Rudy Carn");
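
ObjectStore's own headers are not reproduced in this report. As a rough illustration of the language mechanism only, the sketch below shows how a library can supply a form of operator new that takes an extra argument saying where the object should go. ToyDatabase, its fixed-size block, and its contains function are hypothetical stand-ins invented for this example; they are not ObjectStore classes, and nothing here is persistent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a database: a fixed block of memory that is
// handed out front to back. It only illustrates the mechanism.
class ToyDatabase {
public:
  ToyDatabase() : used_(0) {}

  // Reserve 'bytes' bytes inside the block, suitably aligned.
  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (used_ + align - 1) / align * align;
    if (start > sizeof(block_) || bytes > sizeof(block_) - start)
      throw std::bad_alloc();
    used_ = start + bytes;
    return block_ + start;
  }

  // True if p points into this database's block.
  bool contains(const void* p) const {
    const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(block_);
    return a >= lo && a < lo + sizeof(block_);
  }

private:
  alignas(std::max_align_t) unsigned char block_[4096];
  std::size_t used_;
};

// Extra-argument form of operator new: the argument after the size tells
// the allocator *where* the new object should live.
void* operator new(std::size_t bytes, ToyDatabase* db) {
  assert(db != nullptr);
  return db->allocate(bytes);
}

// Matching form of operator delete; the language calls it only if the
// constructor throws during new(db) T(...).
void operator delete(void*, ToyDatabase*) noexcept {}

struct Student {
  const char* name;
  explicit Student(const char* n) : name(n) {}
};
```

With these definitions, new(&students_db) Student("Louis Witt") constructs the Student inside the block owned by students_db, while an ordinary local Student stays outside it. The toy does not model deletion, type specifications, or array counts.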
  

Goals of the port

Scope

The scope of this project is to allow instances of nearly all Tools.h++ classes to be allocated into persistent storage via ObjectStore's overloaded new operator. The overriding design goal of this effort is for the ObjectStore version to stay as close to the standard version of Tools.h++ as possible, that is:

Platforms

The library was ported to the SunOS 4.x version of ObjectStore using the supplied OSCC compiler. This is the configuration certified by Rogue Wave. Customers have since taken the product to other platforms including HP, SGI, and Windows NT. Compilers other than OSCC have been used for code generation.

Library Porting Issues

Heap allocation from within class member functions

By far, the majority of the work to port the Tools.h++ library to ObjectStore involved modifying heap allocations from within class member functions. Consider, for example, this class which maintains a vector of ints, the size of which can only be determined when an instance of the class is created:

class SimpleVec {
public:
  int  at(size_t i);              // return the i'th element
  void put(size_t i, int value);  // set the i'th element to value
  void resize(size_t n);          // Resize the vector
  SimpleVec(size_t n);            // Construct vector of length n
  SimpleVec& operator=(const SimpleVec&);
private:
  int* vec_;
  size_t  elements_; 
};
SimpleVec::SimpleVec(size_t n) : elements_(n)
{
  vec_ = new int[n];
}

Note the heap allocation in the SimpleVec constructor. Now imagine that a user of the class tries to create an instance of SimpleVec in persistent storage:

os_database* db;
os_typespec* simplevec_type;
SimpleVec* v;
// ...
v = new(db, simplevec_type) SimpleVec(10);

The SimpleVec object itself will be allocated into the persistent storage of database db. But what about the array of integers that vec_ is pointing to? It will be allocated into transient memory, only to disappear as soon as the current process ends. The next program that accesses the db database and retrieves the SimpleVec will have an object whose pointer vec_ points to nonsense. The real information being maintained by this object is long since gone.

To port this class to ObjectStore, we must alter the constructor so that it allocates the vector of integers into either persistent or transient storage as appropriate. Recall that ObjectStore makes it easy to place objects which are likely to be referenced together into the same segment. This certainly applies to an instance of SimpleVec and the integer array pointed to by its member vec_. The following version of the SimpleVec constructor ensures that an instance of SimpleVec shares a segment with its component array.

SimpleVec::SimpleVec(size_t n) : elements_(n)
{
  // allocate an int vector in the same segment as self:
  vec_ = new(os_segment::of(this), os_typespec::get_int(), n) int[n];
}

Note that if a SimpleVec is allocated into transient memory, the call to os_segment::of will return the "transient segment," and vec_ will point to transient heap memory as we would hope. (The static member function get_int(), of class os_typespec, returns a pointer to an os_typespec object corresponding to the int data type. ObjectStore provides similar functions for each of the built-in data types as an efficient way to obtain these frequently used objects.)
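
The behavior just described can be imitated without ObjectStore. The following is only a sketch under stated assumptions: ToySegment is a hypothetical stand-in for os_segment (a simple bump allocator with a registry), ToySegment::of and ToySegment::transient are invented here to mimic the roles of os_segment::of and the transient segment, and there is no typespec argument because the toy does not need one. Memory is released only when a segment is destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for os_segment: a bump allocator over a fixed block.
class ToySegment {
public:
  ToySegment() : used_(0) { all().push_back(this); }
  ~ToySegment() {
    std::vector<ToySegment*>& v = all();
    v.erase(std::remove(v.begin(), v.end(), this), v.end());
  }
  ToySegment(const ToySegment&) = delete;
  ToySegment& operator=(const ToySegment&) = delete;

  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (used_ + align - 1) / align * align;
    if (start > sizeof(block_) || bytes > sizeof(block_) - start)
      throw std::bad_alloc();
    used_ = start + bytes;
    return block_ + start;
  }

  bool contains(const void* p) const {
    const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(block_);
    return a >= lo && a < lo + sizeof(block_);
  }

  // Plays the role of the "transient segment".
  static ToySegment* transient() { static ToySegment t; return &t; }

  // Segment whose block contains p, or the transient segment if none does.
  static ToySegment* of(const void* p) {
    const std::vector<ToySegment*>& v = all();
    for (std::size_t i = 0; i < v.size(); ++i)
      if (v[i]->contains(p)) return v[i];
    return transient();
  }

private:
  static std::vector<ToySegment*>& all() {
    static std::vector<ToySegment*> v;
    return v;
  }
  alignas(std::max_align_t) unsigned char block_[4096];
  std::size_t used_;
};

void* operator new(std::size_t n, ToySegment* s)   { return s->allocate(n); }
void* operator new[](std::size_t n, ToySegment* s) { return s->allocate(n); }
void operator delete(void*, ToySegment*) noexcept {}
void operator delete[](void*, ToySegment*) noexcept {}

class SimpleVec {
public:
  explicit SimpleVec(std::size_t n) : elements_(n) {
    assert(n > 0);
    // Same idea as the ported constructor: the array goes wherever self is.
    vec_ = new(ToySegment::of(this)) int[n];
  }
  const int* data() const { return vec_; }
  std::size_t size() const { return elements_; }
private:
  int* vec_;
  std::size_t elements_;
};
```

A SimpleVec created with new(&segment) ends up with its array in the same toy segment, while a SimpleVec on the stack gets its array from the toy transient segment. The constructor itself does not need to know which case applies.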

It is not enough to search constructors for heap allocations; all member functions must be examined. For example, the resize function must create a new array if the size is increased, and of course this array needs to be allocated appropriately. Here is how the method looks after porting to ObjectStore; only the call to operator new is affected:

void SimpleVec::resize(size_t newSize)
{
  if (newSize > elements_) {
    int* temp = new(os_segment::of(this),
                    os_typespec::get_int(),  // create a new, larger vector
                    newSize) int[newSize];   //   in the same segment as self
    for (size_t i = 0; i < elements_; i++)
      temp[i] = vec_[i];                // copy each element to new vector
    delete [] vec_;                     // remove old vector from memory
                                        //   (or database if persistent)
    vec_ = temp;                        // replace old vector with new vector
  }
  elements_ = newSize;                  // set number of elements to new size
}

Porting Tools.h++ and its applications

Approximately 40 classes in Tools.h++ (about 1/3 of the total) contained one or more methods which had to be ported in this way. This accounted for nearly 170 calls to the default operator new which were replaced by calls to ObjectStore's overloaded operator new. Programs currently linked to the standard version of the library need no source changes as a result of these modifications in order to use the ObjectStore version instead. Instances of Tools.h++ classes being allocated on the heap will continue to be stored in transient memory without modification to the code. Of course, if part of the porting process involves storing objects persistently in an ObjectStore database, the programmer will have to modify the relevant calls to operator new.

  

Copy on write

One nice thing about using a class library is that its authors, entertaining glorious visions of code reuse, are willing to take extra time to implement all those clever strategies that can dramatically increase efficiency. One of these is known as copy on write. This technique extols the virtues of being lazy and putting things off until absolutely necessary, based on the presumption that sometimes they won't be. Imagine assigning one SimpleVec to another.

SimpleVec v1(30);
SimpleVec v2(10);
// ...
v2 = v1;

Following the assignment, v2 should itself be a vector of length thirty, containing the identical values at each element as v1. The main cost of this operation is copying each of the thirty integers. It would be much more efficient to simply copy the array pointer vec_, after which v2 will contain the same values as v1. Of course the problem there is that, contrary to programmer expectations, any change to an element of either v1 or v2 would effectively be a corresponding change in the other as well.

With the copy on write technique the individual elements are copied only when necessary. With some extra effort by the library developer, the integer arrays can be shared among multiple SimpleVec objects, avoiding all that copying --- at least until such time as it becomes necessary to preserve the copy semantics of the assignment operator. This can be achieved by making the vec_ member a pointer to an instance of a helper class, say, SimpleVecRef, that maintains not only an array of integers, but also a reference count denoting the number of SimpleVec objects sharing the same elements.

SimpleVec& SimpleVec::operator=(const SimpleVec& svec)
{
  if (this == &svec) return *this;   // self-assignment: nothing to do
  elements_ = svec.elements_;
  vec_->removeReference();
  if (vec_->refCount() == 0) delete vec_;
  vec_ = svec.vec_;
  vec_->addReference();
  return *this;
}

When porting to ObjectStore one must constantly think about where objects live --- in transient storage? In a database? If so, then which database? The method above is very efficient, but it results in two different objects that contain pointers to the same SimpleVecRef. If an object in transient storage is assigned to an object in a database, the latter object will have a member pointing to transient storage. Similarly, if an object from one database is assigned to an object in a different database, the latter object will have a member pointing into a foreign database. This is permitted by ObjectStore under certain circumstances, but it links the two databases in a way probably not intended by the programmer.

Not only that, but contrary to the spirit of copy on write as a "behind the scenes" implementation technique, the programmers and database administrators would have to be aware of this link. If not, they might purge one of the databases without realizing that the other depends on it. Thus, in ObjectStore, the copy on write technique is only appropriate when assigning between two objects of the same database. In the interest of efficient clustering, we further check to make sure that both objects are in the same segment before employing copy on write:

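SimpleVec& SimpleVec::operator=(const SimpleVec& svec)
{
  if (this == &svec) return *this;   // self-assignment: nothing to do
  elements_ = svec.elements_;
  vec_->removeReference();
  if (vec_->refCount() == 0) delete vec_;
  if (os_segment::of(this) == os_segment::of(&svec)) {
    // OK to share the array--assign pointer and adjust ref count:
    vec_ = svec.vec_;
    vec_->addReference();
  }
  else {
    // Not in same segment. Don't share; instead, copy each element into a
    // new SimpleVecRef in the same segment as self (simplevecref_type is
    // assumed to be an os_typespec* for SimpleVecRef):
    vec_ = new(os_segment::of(this), simplevecref_type) SimpleVecRef(svec.vec_);
  }
  return *this;
}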
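
The helper class SimpleVecRef is referred to in this section but not shown. Setting ObjectStore aside entirely, a minimal sketch of plain copy on write might look like the following. The member names data, size, and sharesStorageWith, and the way put triggers the deferred copy, are assumptions made for this illustration; this is not the Tools.h++ implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: owns the int array and counts how many SimpleVec
// objects currently share it.
class SimpleVecRef {
public:
  explicit SimpleVecRef(std::size_t n)
    : data_(new int[n]()), size_(n), refs_(1) {}
  SimpleVecRef(const SimpleVecRef& other)          // deep copy, count restarts
    : data_(new int[other.size_]), size_(other.size_), refs_(1) {
    for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
  }
  SimpleVecRef& operator=(const SimpleVecRef&) = delete;
  ~SimpleVecRef() { delete [] data_; }

  void addReference()    { ++refs_; }
  void removeReference() { --refs_; }
  unsigned refCount() const { return refs_; }
  int* data() { return data_; }

private:
  int* data_;
  std::size_t size_;
  unsigned refs_;
};

class SimpleVec {
public:
  explicit SimpleVec(std::size_t n) : vec_(new SimpleVecRef(n)), elements_(n) {}
  SimpleVec(const SimpleVec& o) : vec_(o.vec_), elements_(o.elements_) {
    vec_->addReference();
  }
  ~SimpleVec() { release(); }

  // Assignment only shares the pointer; no elements are copied here.
  SimpleVec& operator=(const SimpleVec& o) {
    if (this != &o) {
      o.vec_->addReference();
      release();
      vec_ = o.vec_;
      elements_ = o.elements_;
    }
    return *this;
  }

  int at(std::size_t i) const { assert(i < elements_); return vec_->data()[i]; }

  // The "write" in copy on write: take a private copy first if shared.
  void put(std::size_t i, int value) {
    assert(i < elements_);
    if (vec_->refCount() > 1) {
      SimpleVecRef* mine = new SimpleVecRef(*vec_);
      vec_->removeReference();
      vec_ = mine;
    }
    vec_->data()[i] = value;
  }

  std::size_t size() const { return elements_; }
  bool sharesStorageWith(const SimpleVec& o) const { return vec_ == o.vec_; }

private:
  void release() {
    vec_->removeReference();
    if (vec_->refCount() == 0) delete vec_;
  }
  SimpleVecRef* vec_;
  std::size_t elements_;
};
```

After v2 = v1 the two vectors share one array. The first put on either of them makes a private copy, so the other vector never observes the change.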

Porting Tools.h++ and its applications

Three classes in Tools.h++ --- both the character and wide-character string classes, RWCString and RWWString, as well as RWTValVirtualArray --- use the copy on write technique, and had to be modified accordingly. These changes do not affect programmers porting applications to the ObjectStore version of the library.

Sharing static objects

Sometimes the library designer knows that a particular class will be instantiated repeatedly with the same constructor arguments. As long as this object is safe to share, for example a TimeZone object representing the local timezone, the library can simply provide one in its data segment. If, say, a TimeTable class contains a member pointer to a TimeZone object, and the majority of TimeTable objects created are for the local timezone, the program can avoid creating the same TimeZone object over and over again by simply pointing to the one supplied by the library. As with the copy on write technique, this sharing can lead to trouble. If a programmer puts a TimeTable object into persistent storage, it is important that it not have a member pointing into the transient storage of the library's data segment. The constructor below checks to see if the object is being created in transient memory, and if so, makes use of the static data. If not, it reverts to creating a new object.

static TimeZone localTimeZone;
TimeTable::TimeTable(TimeZone* tz = NULL)  // Null means use local timezone
{
  //...
  if (tz == NULL) {
    if (os_segment::of(this) == os_segment::get_transient_segment())
      // Use handy TimeZone object above:  
      zone_ = &localTimeZone;
    else
      // Persistent allocation; Can't point to data segment
      zone_ = new(os_segment::of(this), timezone_type) TimeZone;
  }
  else
    zone_ = tz;
}
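
The same decision can be tried out without ObjectStore. In the sketch below, ToyStore is a hypothetical stand-in for persistent storage (one fixed block of memory) and toyStore.contains(this) plays the role of the comparison against the transient segment. The names ToyStore, toyStore, zone, and offsetMinutes are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for persistent storage: one fixed block of memory.
class ToyStore {
public:
  ToyStore() : used_(0) {}
  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (used_ + align - 1) / align * align;
    if (start > sizeof(block_) || bytes > sizeof(block_) - start)
      throw std::bad_alloc();
    used_ = start + bytes;
    return block_ + start;
  }
  bool contains(const void* p) const {
    const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(block_);
    return a >= lo && a < lo + sizeof(block_);
  }
private:
  alignas(std::max_align_t) unsigned char block_[1024];
  std::size_t used_;
};

void* operator new(std::size_t bytes, ToyStore* s) {
  assert(s != nullptr);
  return s->allocate(bytes);
}
void operator delete(void*, ToyStore*) noexcept {}

static ToyStore toyStore;          // plays the role of the database

struct TimeZone {
  int offsetMinutes;
  TimeZone() : offsetMinutes(0) {}
};

static TimeZone localTimeZone;     // the shared object in the data segment

class TimeTable {
public:
  explicit TimeTable(TimeZone* tz = nullptr) {
    if (tz != nullptr)
      zone_ = tz;                        // caller supplied a zone
    else if (!toyStore.contains(this))
      zone_ = &localTimeZone;            // transient: safe to share
    else
      zone_ = new(&toyStore) TimeZone;   // stored: needs its own copy
  }
  const TimeZone* zone() const { return zone_; }
private:
  TimeZone* zone_;
};
```

A TimeTable on the stack points at the shared localTimeZone, while one created with new(&toyStore) receives its own TimeZone inside the store, so nothing in the store points back into the program's data segment.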

Porting Tools.h++ and its applications

Both the character and wide-character string classes, RWCString and RWWString, use this technique and had to be modified accordingly. These changes do not affect programmers porting applications to the ObjectStore version of the library.

Storing pointers to functions

Sometimes classes contain members that point to functions. For example, in Tools.h++ there are several template-based collections which require the programmer to supply a pointer to a hash function. Should one of these collections be stored in an ObjectStore database, the pointer to the hash function will be a pointer to transient memory. Should that same collection be retrieved from the database in another process, the hash-function pointer will have a meaningless value, and as soon as a member function of the collection tries to hash something, a program crash will likely result. Since functions are not objects which can be stored in a database, classes which contain function pointers as members need to be re-engineered if they are to be candidates for persistent storage.

The Tools.h++ library class template RWTValHashTable<T> is a parameterized hash table of type T. Through the constructor, the objects maintain a pointer to a function which takes a const T& as an argument and returns an unsigned as the hash value.

unsigned fooHash(const Foo& item)
{
  // ... calculate and return the hash value of a Foo
}
// Create a hash table of Foos which uses fooHash for hashing:
RWTValHashTable<Foo> fooTable(fooHash);

Rogue Wave's solution to this problem is to supply a new abstract class template called RWTHashFun, whose purpose is to effectively wrap up a hash function into an object which can then be stored in an ObjectStore database.

template <class T>
class RWTHashFun {
  virtual unsigned hashFun(const T&) const = 0;
public:
  unsigned operator()(const T& p) const { return hashFun(p); }
};

RWTValHashTable is modified to use a pointer to an RWTHashFun object instead of a hash function. It is up to the creator of an RWTValHashTable to subclass RWTHashFun and override the pure virtual function so that it computes the hash value. By overloading operator(), we can use the hash object as we would a hash function. This is especially nice in relation to our goal of minimizing modifications to the Tools.h++ source code. The same code that worked to retrieve a hash value via a direct call to a hash function will work just as well via operator() on the hashing object.

class FooHasher : public RWTHashFun<Foo>
{
  virtual unsigned hashFun(const Foo&) const;
};
unsigned FooHasher::hashFun(const Foo& item) const
{
  // ... calculate and return the hash value of a Foo
}

// Create a hashing object in the database db:
FooHasher* fooHash = new(db, foohasher_type) FooHasher;
// Create a persistent hash table of Foos which uses fooHash for hashing:
RWTValHashTable<Foo>* fooTable = 
              new(db, hashtable_type) RWTValHashTable<Foo>(fooHash);

Now any process can retrieve the hash table from database db, and the table's hashing function will be intact and ready to hash. Of course the retrieving process must include the overridden hashFun somewhere in the executable, but we can leave it to ObjectStore to take care of the messy business of "vtable relocation," that is, supplying the appropriate virtual table pointer to objects as they are read in from persistent storage.
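
The hashing-object idea can be exercised on its own, without a database. The sketch below repeats the RWTHashFun template as given in this report (with a virtual destructor added) and pairs it with a hypothetical, much-reduced ToyHashTable that only selects a bucket. ToyHashTable, bucketOf, the key member of Foo, and the particular hash calculation are assumptions made for this illustration, not part of Tools.h++.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The abstract hashing object, as described in the text.
template <class T>
class RWTHashFun {
  virtual unsigned hashFun(const T&) const = 0;
public:
  virtual ~RWTHashFun() {}
  unsigned operator()(const T& p) const { return hashFun(p); }
};

struct Foo {
  std::string key;
};

// A concrete hasher: overrides the pure virtual function.
class FooHasher : public RWTHashFun<Foo> {
  virtual unsigned hashFun(const Foo& item) const {
    unsigned h = 0;
    for (std::size_t i = 0; i < item.key.size(); ++i)
      h = h * 31u + static_cast<unsigned char>(item.key[i]);
    return h;
  }
};

// Hypothetical, much-reduced table: it holds a pointer to a hashing
// object rather than a pointer to a function.
template <class T>
class ToyHashTable {
public:
  ToyHashTable(const RWTHashFun<T>* h, std::size_t buckets)
    : hash_(h), buckets_(buckets) {
    assert(h != nullptr && buckets > 0);
  }
  // The call through operator() reads just like a function call.
  std::size_t bucketOf(const T& item) const {
    return (*hash_)(item) % buckets_;
  }
private:
  const RWTHashFun<T>* hash_;
  std::size_t buckets_;
};
```

Because the table stores an object with a virtual function rather than a raw function address, the only per-process fix-up needed when the object is read back is the virtual table pointer, which is the part the text leaves to ObjectStore.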

Porting Tools.h++ and its applications

Several template-based collections in Tools.h++ had to be modified in this way, including RWTValHashTable, RWTPtrHashTable, RWTValHashDictionary, RWTPtrHashDictionary, RWTValHashSet, and RWTPtrHashSet.

This is the only place where the type or number of parameters in the functions and methods of the public interface differs between the ObjectStore and standard versions of Tools.h++. Programmers porting an existing application that contains one of the above classes will have to modify calls to their constructors by providing a pointer to an instance of a subclass of RWTHashFun instead of a pointer to the hash function itself. These changes must be made whether or not instances will be stored persistently.

What's my typespec?

Template-based collections exact another complication when porting to ObjectStore. Let's consider a parameterized version of the first incarnation of our SimpleVec class, its constructor in particular. Instead of an array of ints, the constructor will allocate an array of some type T, as instantiated by the user of the class.

template <class T>
SimpleVec<T>::SimpleVec(size_t n)
{
  	// allocate a vector of T-s in the same segment as self:
	vec_ = new(os_segment::of(this), ?????, n) T[n];
}

The question marks in the constructor point out the problem here. How can we supply an os_typespec for type T when we don't know what type that is? The os_typespec constructor takes a string containing the name of the type, so we can't simply create one at runtime.

// This won't work!
os_typespec* T_type = new os_typespec("T");

The template preprocessor won't expand the T inside the string to the appropriate type name. A solution here is to add a static member T_typespec of type os_typespec* to the template, and rely on the instantiator of the template --- the programmer using the library --- to provide the definition. The linker will complain if he or she forgets, so there are no nasty surprises at run time.

template <class T>
class SimpleVec {
public:
  //... public part
  static os_typespec* T_typespec;
private:
  //... private part
};
template <class T>
SimpleVec<T>::SimpleVec(size_t n)
{
  	// allocate an array of T-s in the same segment as self:
	vec_ = new(os_segment::of(this), T_typespec, n) T[n];
}

The programmer using the class would do the following:

// Define the typespec for my instantiation of SimpleVec:
os_typespec* SimpleVec<Foo>::T_typespec = new os_typespec("Foo");
// Store a SimpleVec of ten Foos in database db
// (foovec_type is an os_typespec* for SimpleVec<Foo>):
SimpleVec<Foo>* fooVec = new(db, foovec_type) SimpleVec<Foo>(10);
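
The "declare in the template, define per instantiation" arrangement is ordinary C++ and can be tried without ObjectStore. In the sketch below, ToyTypespec is a hypothetical stand-in for os_typespec that merely remembers a type name, and elementTypeName and fooSpec are helpers invented for this illustration. Standard C++ spells the per-instantiation definition with a leading template <>.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for os_typespec: just remembers a type name.
struct ToyTypespec {
  const char* name;
  explicit ToyTypespec(const char* n) : name(n) {}
};

template <class T>
class SimpleVec {
public:
  // Declared by the library, deliberately left undefined: each user of
  // the template must supply a definition for his or her own T.
  static ToyTypespec* T_typespec;

  static const char* elementTypeName() {
    assert(T_typespec != nullptr);
    return T_typespec->name;
  }
};

struct Foo {};

// Supplied by the programmer using the library, once per instantiation.
static ToyTypespec fooSpec("Foo");
template <> ToyTypespec* SimpleVec<Foo>::T_typespec = &fooSpec;
```

If the last two lines are omitted, any use of SimpleVec<Foo>::T_typespec is reported as an undefined symbol at link time rather than failing at run time, which is the safety net the text relies on.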
	

Porting Tools.h++ and its applications

The RWTValVector class in the Tools.h++ library was modified to include the static member T_typespec.

Programmers porting an application which uses this class, or either of its subclasses (RWTValOrderedVector and RWTValSortedVector) will have to supply a definition for T_typespec, for each different instantiation of the class. Note that the programmer must provide the definition whether or not instances of RWTValVector will be persistent.

What's my typespec II?

Sometimes, the member function of a template class allocates storage for a helper class instantiated on the same type. For instance, a class LinkedList, parameterized by class T, might create an instance of a class Link, instantiated with T. We assume for this example that the linked-list class is pointer based, that is, it stores pointers to objects of type T, not copies of them. Unlike in the previous section, the library is not responsible for creating objects of type T in persistent storage; the programmer using the library will do that. However, once again the fact that we can't know what type T will be makes it impossible to create the instance of an os_typespec necessary to store an object of class Link<T>.

Fortunately, since the Link template is of our own creation, this time we don't have to put the burden on the user of the library to supply the os_typespec. ObjectStore allows any class to include a static member function get_os_typespec, which returns a pointer to an os_typespec for the class. Only the declaration need be provided; ObjectStore itself will provide the definition via the build process for the application.

template <class T>
class Link {
  // ...
public:
  //...
  static os_typespec* get_os_typespec();
};

The static member function can then be used to obtain the appropriate os_typespec at run-time.

template <class T>
void LinkedList<T>::insert(T* item)
{
  // Create a link object holding item in the same
  //    segment as self:
  Link<T>* link = new(os_segment::of(this),
                      Link<T>::get_os_typespec())
                  Link<T>(item);
  //...
}

Porting Tools.h++ and its applications

Many of the templates provided by the library make use of other parameterized classes, necessitating the declaration of the get_os_typespec member functions. Thinking ahead to those using the library, however, makes us realize that any programmer writing a template class which contains a pointer to an instance of a template-based class from Tools.h++, may require that class to include the get_os_typespec member function. Thus, all of the template-based classes were modified to include a declaration for this static member function.

The impact of this change on the programmer porting existing Tools.h++ applications to the ObjectStore version does not involve modifying the program itself. It does require that the programmer include an application schema source file, which is used to generate the application schema database (all part of the ObjectStore magic; the details are beyond the scope of this report, see the ObjectStore documentation). The schema source file marks any classes whose instances the application may store into or read from persistent memory. It is during schema-database generation that ObjectStore creates the definitions of any get_os_typespec member functions. Without this step these functions would be flagged as undefined by the linker. Even if the programmer does not intend to store instances of the template-based classes in an ObjectStore database, he or she must mark those classes as persistent.

Library schema file

Similar to the application schema file, libraries have an associated library schema file, marking any classes that the library uses in a persistent context. When an application links to the library, the necessary information from the library schema database is incorporated into the application schema database. To save the programmer from marking Tools.h++ classes used in their application, all non-template classes from the library are included in the Tools.h++ library schema file.

Classes not ported

Not all classes in the Tools.h++ library were ported to be candidates for storage into the persistent memory of an ObjectStore database. Some Tools.h++ classes contain member pointers to instances of classes from the iostream library --- e.g. istream, ostream, and streambuf. As the iostream library has not itself been ported to ObjectStore, these objects are not suitable for persistent storage. For instance, a streambuf object may dynamically allocate an array of characters. Even if the streambuf were allocated persistently, it wouldn't know to allocate its component array that way. Thus, none of the Rogue Wave stream classes were ported. Similarly, other Tools.h++ classes make use, directly or indirectly, of FILE structs; these were not ported either. Finally, some classes were not ported simply because we felt that it would be unlikely for programmers to use them in a persistent context.

Although the following classes were not ported, they can still be used in the transient memory of an ObjectStore application:

Testing and maintenance

The ObjectStore version of the Tools.h++ library should offer all the functionality of the standard version, with the additional capability of persistent storage. Thus it is important that it successfully build and execute the Tools.h++ test suite, after the minor porting of the test programs as suggested in Section 4 above.

Beyond that, through sample applications, and programs which exercise the persistence capability of all suitable classes, we can verify that objects can successfully be written to and read from persistent storage.


© Copyright 1995, Rogue Wave Software, Inc.